Abstract

Reversible or information-lossless circuits have applications in digital signal processing,
communication, computer graphics and cryptography. They are also a fundamental requirement in the
emerging field of quantum computation. We investigate the synthesis of reversible circuits that
employ a minimum number of gates and contain no redundant input-output line-pairs (temporary
storage channels). We prove constructively that every even permutation can be implemented
without temporary storage using NOT, CNOT and TOFFOLI gates. We describe an algorithm
for the synthesis of optimal circuits and study the reversible functions on three wires, reporting
the distribution of circuit sizes. Finally, in an application important to quantum computing, we
synthesize oracle circuits for Grover's search algorithm, and show a significant improvement over
a previously proposed synthesis algorithm.

1 Introduction



In most computing tasks, the number of output bits is relatively small compared to the number of 
input bits. For example, in a decision problem, the output is only one bit (yes or no) and the input 
can be as large as desired. However, computational tasks in digital signal processing, communica- 
tion, computer graphics, and cryptography require that all of the information encoded in the input 
be preserved in the output. Some of those tasks are important enough to justify adding new mi- 
croprocessor instructions to the HP PA-RISC (MAX and MAX-2), Sun SPARC (VIS), PowerPC 
(AltiVec), IA-32 and IA-64 (MMX) instruction sets (THini. In particular, new bit-permutation 
instructions were shown to vastly improve performance of several standard algorithms, including 
matrix transposition and DES, as well as two recent cryptographic algorithms Twofish and Serpent 
IIT3I . Bit permutations are a special case of reversible functions, that is, functions that permute 
the set of possible input values. For example, the butterfly operation (x, y) → (x + y, x − y) is
reversible but is not a bit permutation. It is a key element of Fast Fourier Transform algorithms 
and has been used in application-specific Xtensa processors from Tensilica. One might expect 
to get further speed-ups by adding instructions to allow computation of an arbitrary reversible 
function. The problem of chaining such instructions together provides one motivation for study- 
ing reversible computation and reversible logic circuits, that is, logic circuits composed of gates 
computing reversible functions. 

Reversible circuits are also interesting because the loss of information associated with irre- 
versibility implies energy loss 0. Younis and Knight showed that some reversible circuits 
can be made asymptotically energy-lossless as their delay is allowed to grow arbitrarily large. 
Currently, energy losses due to irreversibility are dwarfed by the overall power dissipation, but 
this may change if power dissipation improves. In particular, reversibility is important for nan- 
otechnologies where switching devices with gain are difficult to build. 

Finally, reversible circuits can be viewed as a special case of quantum circuits because
quantum evolution must be reversible [14]. Classical (non-quantum) reversible gates are subject to the
same "circuit rules," whether they operate on classical bits or quantum states. In fact, popular 
universal gate libraries for quantum computation often contain as subsets universal gate libraries 
for classical reversible computation. While the speed-ups which make quantum computing at- 
tractive are not available without purely quantum gates, logic synthesis for classical reversible 
circuits is a first step toward synthesis of quantum circuits. Moreover, algorithms for quantum 
communications and cryptography often do not have classical counterparts because they act on 
quantum states, even if their action in a given computational basis corresponds to classical re- 
versible functions on bit-strings. Another connection between classical and quantum computing 
comes from Grover's quantum search algorithm [6]. Circuits for Grover's algorithm contain large
parts consisting of NOT, CNOT and TOFFOLI gates only [14].

We review existing work on classical reversible circuits. Toffoli [20] gives constructions for an
arbitrary reversible or irreversible function in terms of a certain gate library. However, his method 
makes use of a large number of temporary storage channels, i.e. input-output wire-pairs other 
than those on which the function is computed (also known as ancilla bits). Sasao and Kinoshita 
show that any conservative function (f(x) is conservative if x and f(x) always contain the same
number of 1s in their binary expansions) has an implementation with only three temporary storage
channels using a certain fixed library of conservative gates, although no explicit construction is
given lIT^ . Kerntopf uses exhaustive search methods to examine small-scale synthesis problems
and related theoretical questions about reversible circuit synthesis (5). There has also been much 
recent work on synthesizing reversible circuits that implement non-reversible Boolean functions 
on some of their outputs, with the goal of providing the quantum phase shift operators needed by 
Grover's quantum search algorithm |I8|E||^. Some work on local optimization of such circuits 
via equivalences has also been done fill ISl. In a different direction, group theory has recently 
been employed as a tool to analyze reversible logic gates fl^ and investigate generators of the 
group of reversible gates ||5l. 

Our work pursues synthesis of optimal reversible circuits which can be implemented without 
temporary storage channels. In Section 3 we show by explicit construction that any reversible
function which performs an even permutation on the input values can be synthesized using the
CNTS (CNOT, NOT, TOFFOLI, and SWAP) gate library and no temporary storage. An arbitrary
(possibly odd) permutation requires at most one channel of temporary storage for implementation. 
By examining circuit equivalences among generalized CNOT gates, we derive a canonical form 
for CNT-circuits. In Section |3] we present synthesis algorithms for implementing any reversible 
function by an optimal circuit with gates from an arbitrary gate library. Besides branch-and- 
bound, we use a dynamic programming technique that exploits reversibility. While we use gate 
count as our cost function throughout, this method allows for many different cost functions to be 
used. Applications to quantum computing are examined in Section]?] 

2 Background 

In conventional (irreversible) circuit synthesis, one typically starts with a universal gate library 
and some specification of a Boolean function. The goal is to find a logic circuit that implements 
the Boolean function and minimizes a given cost metric, e.g., the number of gates or the circuit 
depth. At a high level, reversible circuit synthesis is just a special case in which no fanout is 
allowed and all gates must be reversible. 
2.1 Reversible Gates and Circuits



Definition 1 A gate is reversible if the (Boolean) function it computes is bijective. 

If arbitrary signals are allowed on the inputs, a necessary condition for reversibility is that the 
gate have the same number of input and output wires. If it has k input and output wires, it is called 
a k × k gate, or a gate on k wires. We will think of the m-th input wire and the m-th output wire
as really being the same wire. Many gates satisfying these conditions have been examined in the
literature flSi . We will consider a specific set defined by Toffoli [20].

Definition 2 A k-CNOT is a (k + 1) × (k + 1) gate. It leaves the first k inputs unchanged, and
inverts the last iff all others are 1. The unchanged lines are referred to as control lines. 

Clearly the k-CNOT gates are all reversible. The first three of these have special names. The
0-CNOT is just an inverter or NOT gate, and is denoted by N. It performs the operation (x) → (x ⊕ 1),
where ⊕ denotes XOR. The 1-CNOT, which performs the operation (y, x) → (y, x ⊕ y), is referred
to as a Controlled-NOT [7], or CNOT (C). The 2-CNOT is normally called a TOFFOLI (T) gate,
and performs the operation (z, y, x) → (z, y, x ⊕ yz). We will also be using another reversible gate,
called the SWAP (S) gate. It is a 2 × 2 gate which exchanges the inputs; that is, (x, y) → (y, x).
One reason for choosing these particular gates is that they appear often in the quantum computing
context, where no physical "wires" exist, and swapping two values requires non-trivial effort
[14]. We will be working with circuits from a given, limited gate library. Usually, this will be the
CNTS gate library, consisting of the CNOT, NOT, TOFFOLI, and SWAP gates.
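
The gate definitions above can be sketched in a few lines of Python. This is only an illustration: the encoding of the wire values as the bits of an integer, and the function names `k_cnot`, `NOT`, `CNOT`, `TOFFOLI`, `SWAP` and `is_bijective`, are assumptions of the sketch and do not come from the paper.

```python
def k_cnot(x, controls, target):
    """Flip bit `target` of the integer x iff every bit listed in `controls` is 1."""
    if all((x >> c) & 1 for c in controls):
        x ^= 1 << target
    return x

def NOT(x, t):                  # 0-CNOT: no controls, always inverts wire t
    return k_cnot(x, (), t)

def CNOT(x, c, t):              # 1-CNOT
    return k_cnot(x, (c,), t)

def TOFFOLI(x, c1, c2, t):      # 2-CNOT
    return k_cnot(x, (c1, c2), t)

def SWAP(x, a, b):
    """Exchange the values on wires a and b."""
    if ((x >> a) & 1) != ((x >> b) & 1):
        x ^= (1 << a) | (1 << b)
    return x

def is_bijective(gate, n):
    """True if `gate` permutes the 2**n possible inputs on n wires."""
    return sorted(gate(x) for x in range(2 ** n)) == list(range(2 ** n))

# Each gate is reversible when viewed as a map on all 2**n inputs.
print(is_bijective(lambda x: TOFFOLI(x, 2, 1, 0), 3))
```

Under this encoding, three alternating CNOT gates on two wires reproduce a SWAP, which can be checked by enumerating the four inputs.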

Definition 3 A well-formed reversible logic circuit is an acyclic combinational logic circuit in 
which all gates are reversible, and are interconnected without fanout. 

As with reversible gates, a reversible circuit has the same number of input and output wires; 
again we will call a reversible circuit with n inputs an n × n circuit, or a circuit on n wires. We
draw reversible circuits as arrays of horizontal lines representing wires. Gates are represented
by vertically-oriented symbols. For example, in Figure 1 we see a reversible circuit drawn in
the notation introduced by Feynman [7]. The ⊕ symbols represent inverters and the • symbols
represent controls. A vertical line connecting a control to an inverter means that the inverter is
only applied if the wire on which the control is set carries a 1 signal. Thus, the gates used are,
from left to right, TOFFOLI, NOT, TOFFOLI, and NOT.

Figure 1: 3 × 3 reversible circuit with two T gates and two N gates.

Figure 2: Truth table for the circuit in Figure 1.

Since we will be dealing only with bijective functions, i.e., permutations, we represent them 
using the cycle notation where a permutation is represented by disjoint cycles of variables. For 
example, the truth table in Figure 2 is represented by (2, 3)(6, 7) because the corresponding
function swaps 010 (2) and 011 (3), and 110 (6) and 111 (7). The set of all permutations of n indices
is denoted S_n, so the set of bijective functions with n binary inputs is S_{2^n}. We will call (2, 3)(6, 7)
CNT-constructible since it can be computed by a circuit with gates from the CNT gate library. 
More generally: 

Definition 4 Let L be a (reversible) gate library. An L-circuit is a circuit composed only of gates 
from L. A permutation π ∈ S_{2^n} is L-constructible if it can be computed by an n × n L-circuit.
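
As a small illustration of the cycle notation, the following Python sketch evaluates a TOFFOLI, NOT, TOFFOLI, NOT circuit on all eight inputs and prints its disjoint cycles. The exact wire placement is an assumption, since the figure is not reproduced here: the sketch puts both N gates on the high-order wire and lets both T gates target the low-order wire. The helper names `cycles`, `toffoli` and `circuit` are hypothetical.

```python
def cycles(perm):
    """Disjoint cycles of length > 1 of a permutation given as a list x -> perm[x]."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen or perm[start] == start:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        out.append(tuple(cyc))
    return out

def toffoli(x, c1, c2, t):
    """Flip bit t of x iff bits c1 and c2 are both 1."""
    return x ^ (1 << t) if (x >> c1) & 1 and (x >> c2) & 1 else x

def circuit(x):
    # Assumed layout: T (controls 2 and 1, target 0), N on wire 2, T again, N on wire 2.
    x = toffoli(x, 2, 1, 0)
    x ^= 0b100
    x = toffoli(x, 2, 1, 0)
    x ^= 0b100
    return x

perm = [circuit(x) for x in range(8)]
print(cycles(perm))   # [(2, 3), (6, 7)]
```

With this layout the circuit flips the low-order bit exactly when the middle bit is 1, which is the action of a single C gate on the bottom two wires.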

Figure 3(a) indicates that the circuit in Figure 1 is equivalent to one consisting of a single C
gate. Pairs of circuits computing the same function are very useful, since we can substitute one
for the other. On the right, we see similarly that three C gates can be used to replace the S gate
appearing in the middle circuit of Figure 3(b). If allowed by the physical implementation, the S gate
may itself be replaced with a wire swap. This, however, is not possible in some forms of quantum
computation [14]. Figure 3 therefore shows us that the C and S gates in the CNTS gate library
can be removed without losing computational power. We will still use the CNTS gate library in
synthesis to reduce gate counts and potentially speed up synthesis. This is motivated by Figure 3,
which shows how to replace four gates with one C gate, and thus up to 12 gates with one S gate.

Figure 3: Reversible circuit equivalences: (a) a T, N, T, N circuit equals a single C gate;
(b) three C gates equal one S gate. On the gate symbols, subscripts identify "control bits"
while superscripts identify bits whose values actually change.

Figure 4: Circuit C with n − k wires Y of temporary storage.

Figure 4 illustrates the meaning of "temporary storage" [20]. The top n − k lines transfer n − k
signals, collectively designated Y, to the corresponding wires on the other side of the circuit. The
signals Y are arbitrary, in the sense that the circuit K must assume nothing about them to make its
computation. Therefore, the output on the bottom k wires must be only a function of their input
values X and not of the "ancilla" bits Y, hence the bottom output is denoted f(X). While the
signals Y must leave the circuit holding the same values they entered it with, their values may be
changed during the computation as long as they are restored by the end. These wires usually serve
as an essential workspace for computing f(X). An example of this can be found in Figure 3(a): the
C gate on the right needs two wires, but if we simulate it with two N gates and two T gates, we
need a third wire. The signal applied to the top wire emerges unaltered.

Definition 5 Let L be a reversible gate library. Then L is universal if for all k and all permutations
π ∈ S_{2^k}, there exists some l such that some L-constructible circuit computes π using l wires of
temporary storage.

The concept of universality differs in the reversible and irreversible cases in two important 
ways. First, we do not allow ourselves access to constant signals during the computation, and 
second, we synthesize whole permutations rather than just functions with one output bit. 



2.2 Prior Work 

It is a result of Toffoli's that the CNT gate library is universal; he also showed that one can bound 
the amount of temporary storage required to compute a permutation in S_{2^n} by n − 3. Indeed,
much of the reversible and quantum circuit literature allows the presence of polynomially many
temporary storage bits for circuit synthesis. Given that qubits are a severely limited resource in 
current implementation technologies, this may not be a realistic assumption. We are therefore 
interested in trying to synthesize permutations using no extra storage. To illustrate the limitations 
this puts on the set of computable permutations, suppose we restrict ourselves to the C gate library. 
The following results are well-known in the quantum circuits literature flSl lSl. We provide proofs 
both for completeness, and to accustom the reader to techniques we will require later. 

Definition 6 A function f : {0, 1}^n → {0, 1}^m is linear iff f(x ⊕ y) = f(x) ⊕ f(y), where ⊕ denotes
bitwise XOR.

This is just the usual definition of linearity where we think of {0, 1}^n as a vector space over
the two-element field F_2. In our work n = m because of reversibility. Thus, f can be thought of
as a square matrix over F_2. The composition of two linear functions is a linear function.

Lemma 7 ^ Every C-constructible permutation computes an invertible linear transformation. 
Moreover, every invertible linear transformation is computable by a C-constructible circuit. No 
C-circuit requires more than n^2 gates.

Proof: To show that all C-circuits are linear, it suffices to prove that each C gate computes a
linear transformation. Indeed, C(x_1 ⊕ x_2, y_1 ⊕ y_2) = (x_1 ⊕ x_2, x_1 ⊕ x_2 ⊕ y_1 ⊕ y_2) =
(x_1, x_1 ⊕ y_1) ⊕ (x_2, x_2 ⊕ y_2) = C(x_1, y_1) ⊕ C(x_2, y_2). In the basis 10...0, 01...0, ..., 0...01, a C gate with the
control on the i-th wire and the inverter on the j-th applied to an arbitrary vector will add the
i-th entry to the j-th. Thus, the matrices corresponding to individual C gates account for all the
elementary row-addition matrices. Any invertible matrix in GL(F_2) can be written as a product
of these. Thus, any invertible linear transformation can be computed by a C-circuit. Finally, any
invertible n × n matrix over F_2 may be row-reduced to the identity using fewer than n^2 row operations. □
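
The row-reduction argument can be turned into a short synthesis routine. The Python sketch below is one possible reading of it and is not the paper's algorithm: it represents a linear map by a list `rows` of bitmasks (output bit i is the XOR of the input bits set in `rows[i]`), reduces the matrix to the identity by row additions only, and returns the corresponding CNOT gates in circuit order. The names `synthesize_c_circuit`, `apply_cnots` and `apply_matrix` are hypothetical.

```python
def synthesize_c_circuit(rows):
    """Return CNOT gates (control, target), in circuit order, computing the
    invertible F_2-linear map whose i-th output is XOR of the inputs in rows[i]."""
    n = len(rows)
    m = list(rows)
    ops = []                                  # row additions: (source row, destination row)
    for col in range(n):
        pivot = next((r for r in range(col, n) if (m[r] >> col) & 1), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_2")
        if pivot != col:                      # use a row addition instead of a row swap
            m[col] ^= m[pivot]
            ops.append((pivot, col))
        for r in range(n):                    # clear column `col` in every other row
            if r != col and (m[r] >> col) & 1:
                m[r] ^= m[col]
                ops.append((col, r))
    # Row additions are involutions, so the matrix is their product in the order
    # found; acting on an input vector, the last operation found is applied first.
    return ops[::-1]

def apply_cnots(gates, x):
    for c, t in gates:
        if (x >> c) & 1:
            x ^= 1 << t
    return x

def apply_matrix(rows, x):
    return sum((bin(r & x).count("1") & 1) << i for i, r in enumerate(rows))

rows = [0b110, 0b011, 0b100]                  # y0 = x1^x2, y1 = x0^x1, y2 = x2
gates = synthesize_c_circuit(rows)
print(gates, all(apply_cnots(gates, x) == apply_matrix(rows, x) for x in range(8)))
```

Each column costs at most n row additions in this sketch, so the gate count never exceeds n^2; the routine makes no attempt to find a gate-minimal circuit.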

One might ask how inefficient the row reduction algorithm is in synthesizing C-circuits. A 
counting argument can be used to find asymptotic lower bounds on the longest circuits [17].

Lemma 8 Let L be a gate library; let K_n ⊂ S_{2^n} be the set of L-constructible permutations on n
wires, and let k_n be the cardinality of K_n. Then the longest gate-minimal L-circuit on n wires has
more than log k_n / log b gates, where b is the number of one-gate circuits on n wires. b = poly(n),
so for large n, worst-case circuits have length Ω(log k_n / log n).

Proof: Suppose the longest gate-minimal L-circuit has x − 1 gates. Then every permutation in
K_n is computed by an L-circuit of at most x − 1 gates. The number of such circuits is
Σ_{i=0}^{x−1} b^i < b^x. Therefore, k_n < b^x, and it follows that x > log k_n / log b.

Finally, let G be a gate in L with the largest number of inputs, say p. Then, on n wires, there
are at most n(n − 1)...(n − p + 1) < n^p ways to make a 1-gate circuit using G. If L has q gates in
total, then b < q n^p = poly(n). Hence, x > log k_n / (p log n + log q) = Ω(log k_n / log n). □

Figure 5: Optimal C-circuits for C-constructible permutations on 2 wires.

We now need to count the number of C-constructible permutations. On two wires, there are
six, corresponding to the six circuits in Figure 5.

Corollary 9 [17] S_{2^n} has Π_{i=0}^{n−1} (2^n − 2^i) C-constructible permutations. Therefore, worst-case
C-circuits require Ω(n^2 / log n) gates.

Proof: A linear mapping is fully defined by its values on basis vectors. There are 2^n − 1 ways of
mapping the n-bit string 10...0. Once we have fixed its image, there are 2^n − 2 ways of mapping
010...0, and so on. Each basis bit-string cannot map to the subspace spanned by the images of the
previous bit-strings. There are 2^n − 2^i choices for the i-th basis bit-string, counting from i = 0.
Once all basis bit-strings are mapped, the mapping of the rest is specified by linearity. The number
of C-constructible permutations on n wires is greater than 2^{n^2/2}. By Lemma 8, worst-case
C-circuits require Ω(n^2 / log n) gates. □
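
The count in Corollary 9 can be checked by brute force for small n. The following Python sketch compares the product formula with the number of permutations actually reachable by composing CNOT gates; the function names `count_formula` and `c_constructible` are assumptions of the sketch, and the exhaustive search is only practical for very small n.

```python
def count_formula(n):
    """Product over i = 0..n-1 of (2**n - 2**i)."""
    total = 1
    for i in range(n):
        total *= 2 ** n - 2 ** i
    return total

def c_constructible(n):
    """Set of all permutations of range(2**n) computed by CNOT-only circuits."""
    size = 2 ** n
    gates = [tuple(x ^ (((x >> c) & 1) << t) for x in range(size))
             for c in range(n) for t in range(n) if c != t]
    identity = tuple(range(size))
    seen, frontier = {identity}, [identity]
    while frontier:                            # breadth-first closure under the gates
        new = []
        for p in frontier:
            for g in gates:
                q = tuple(g[p[x]] for x in range(size))   # circuit p followed by gate g
                if q not in seen:
                    seen.add(q)
                    new.append(q)
        frontier = new
    return seen

for n in (2, 3):
    print(n, count_formula(n), len(c_constructible(n)))
```

For n = 2 both numbers are 6, matching the six circuits of Figure 5, and for n = 3 both are 168.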

Let us return to CNT-constructible permutations. A result similar to Lemma 7 requires:

Definition 10 A permutation is called even if it can be written as the product of an even number
of transpositions. The set of even permutations in S_n is denoted A_n.

It is well-known that if a permutation can be written as the product of an even number of
transpositions, then it cannot be written as the product of an odd number of transpositions.
Moreover, half the permutations in S_n are even for n > 1.

Lemma 11 [20] Any n × n circuit with no n × n gates computes an even permutation.
Proof: It suffices to prove this for a circuit consisting of only one gate, as the product of even
permutations is even. Let G be a gate in an n × n circuit. By hypothesis, G is not n × n, so there
must be at least one wire which is unaffected by G. Without loss of generality, let this be the
high-order wire. Then 2^{n−1} ⊕ G(k) = G(2^{n−1} ⊕ k), and k < 2^{n−1} implies G(k) < 2^{n−1}. Thus every
cycle in the cycle decomposition of G appears in duplicate: once with numbers less than 2^{n−1},
and once with the corresponding numbers with their high order bits set to one. But these cycles
have the same length, and so their product is an even permutation. Therefore, G is the product of
even permutations, and hence is even. □
To illustrate this result, consider the following example. A 2 × 2 circuit consisting of a single
S gate performs the permutation (1, 2), as the inputs 01 and 10 are interchanged, and the inputs 00
and 11 remain fixed. This permutation consists of one transposition, and is therefore odd. On the
other hand, in a 3 × 3 circuit, one can check that a swap gate on the bottom two wires performs
the permutation (1, 2)(5, 6), which is even.
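
The example above is easy to confirm numerically. The Python sketch below computes the parity of a permutation from its cycle lengths and applies it to a SWAP gate placed on two and on three wires; the names `is_even` and `swap_perm` are hypothetical helpers introduced for illustration.

```python
def is_even(perm):
    """True if the permutation (a list mapping x -> perm[x]) is even."""
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        length, x = 0, start
        while not seen[x]:
            seen[x] = True
            x = perm[x]
            length += 1
        if length:
            transpositions += length - 1   # a cycle of length L is a product of L - 1 transpositions
    return transpositions % 2 == 0

def swap_perm(a, b, n):
    """Permutation computed by a SWAP gate on wires a and b of an n-wire circuit."""
    def swap(x):
        if ((x >> a) & 1) != ((x >> b) & 1):
            x ^= (1 << a) | (1 << b)
        return x
    return [swap(x) for x in range(2 ** n)]

print(is_even(swap_perm(0, 1, 2)))   # False: the single transposition (1, 2)
print(is_even(swap_perm(0, 1, 3)))   # True: (1, 2)(5, 6)
```

The extra unaffected wire duplicates every cycle, which is exactly the mechanism used in the proof of Lemma 11.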

3 Theoretical Results 

Since the CNTS gate library contains no gates of size greater than three, Lemma 11 implies that
every CNTS-constructible (without temporary storage) permutation is even for n ≥ 4. The main
result of this section is that the converse is also true.

Theorem 12 Every even permutation is CNT-constructible. 

Before beginning the proof, we offer the following two corollaries. These give a way to
synthesize circuits computing odd permutations using temporary storage, and also extend Theorem
12 to an arbitrary universal gate library.

Corollary 13 Every permutation, even or odd, may be computed in a CNT-circuit with at most 
one wire of temporary storage. 

Proof: Suppose we have an n × n gate G computing π ∈ S_{2^n}, and we place it on the bottom n wires
of an (n + 1) × (n + 1) reversible circuit; let π' be the permutation computed by this new circuit.
Then by Lemma 11, π' is even. By Theorem 12, π' is CNT-constructible. Let C be a CNT-circuit
computing π'. C computes π with one line of temporary storage. □

Corollary 14 For any universal gate library L and sufficiently large n, permutations in A_{2^n} are
L-constructible, and those in S_{2^n} are realizable with at most one wire of temporary storage.
Proof: Since L is universal, there is some number k such that we can compute the permutations
corresponding to the NOT, CNOT, and TOFFOLI gates using a total of k wires. Let n > k, and let
π ∈ A_{2^n}. By Theorem 12 we can find a CNT-circuit C computing π, and can replace every N, C,
or T gate with a circuit computing it. The second claim follows similarly from Theorem 12 and
Corollary 13. □

To prove Theorem 12 we begin by asking which permutations are C-constructible, N-constructible,
and T-constructible. The first of these questions was answered in Section 2. We now summarize
the properties of N-constructible permutations. In what follows, © denotes bitwise XOR. 

Definition 15 Given an integer i, we denote by N^i the circuit formed by placing an N gate on
every wire corresponding to a 1 in the binary expansion of i.

Figure 6: Circuits N^i for i < 8. The superscript is interpreted as a binary
number, whose non-zero bits correspond to the location of inverters.

We will use N^i to signify both the circuit described above, and the permutation which this
circuit computes. Technically, the latter is not uniquely determined by the N^i notation, but also
depends on the number n of wires in the circuit; however, n will always be clear from context.
The N^i notation is illustrated for the case of three wires in Figure 6.

Lemma 16 Let π ∈ S_{2^n} be N-constructible. There exists an i such that π(x) = x ⊕ i. Moreover, the
gate-minimal circuit for π is N^i. There are 2^n N-constructible permutations in S_{2^n}.
Proof: Clearly, N^i computes the permutation π(x) = x ⊕ i. It now suffices to show that an arbitrary
N-circuit may be reduced to one of the N^i circuits. Any pair of consecutive N gates on the same
wire may be removed without changing the permutation computed by the circuit. Applying this
transformation until no more gates can be removed must leave a circuit with at most one N gate
per wire; that is, a circuit of the form N^i. □

3.1 T-Constructible Permutations 

Characterizing the T-constructible permutations is more difficult. We will begin by extending the
N^i notation defined above.

Definition 17 Let N^h be an N-circuit as defined above. Let k be an integer such that the bitwise
Boolean product h · k = 0. Let there be p 1s in the binary expansion of h, and q in the binary
expansion of k. Define N^h_k to be the reversible circuit composed of p q-CNOT gates, with control
bits on the wires specified by the binary expansion of k, and inverters as specified by the binary
expansion of h. N^h_k performs N^h iff the wires specified by k have the value 1.

In a 3 × 3 circuit, there are 3 possible T gates, namely N^1_6, N^2_5, and N^4_3. They compute the
permutations (6, 7), (5, 7), (3, 7) respectively. By composing these three transpositions in all possible
ways, we may form all 24 permutations of 3, 5, 6, 7. These are precisely the non-negative integers
less than 8 which are not of the form 0 or 2^i. Clearly, no T gate can affect an input with fewer
than two 1s in its binary expansion.
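
The three-wire claim can be verified exhaustively. The Python sketch below builds the three T gates as permutations of {0, ..., 7}, closes them under composition, and checks both the group size and the fixed points; `toffoli_perm` and `generated_group` are hypothetical helper names, and wire i is identified with bit i of the input.

```python
def toffoli_perm(c1, c2, t, n=3):
    """Permutation of range(2**n) computed by one TOFFOLI gate."""
    return tuple(x ^ (1 << t) if (x >> c1) & 1 and (x >> c2) & 1 else x
                 for x in range(2 ** n))

def generated_group(gens):
    """All permutations reachable by composing the generators (breadth-first closure)."""
    identity = tuple(range(len(gens[0])))
    seen, frontier = {identity}, [identity]
    while frontier:
        new = []
        for p in frontier:
            for g in gens:
                q = tuple(g[p[x]] for x in range(len(p)))   # circuit p followed by gate g
                if q not in seen:
                    seen.add(q)
                    new.append(q)
        frontier = new
    return seen

# The gates computing (6, 7), (5, 7) and (3, 7) respectively.
t_gates = [toffoli_perm(1, 2, 0), toffoli_perm(0, 2, 1), toffoli_perm(0, 1, 2)]
group = generated_group(t_gates)
print(len(group))                                             # 24
print(all(p[x] == x for p in group for x in (0, 1, 2, 4)))    # True
```

The 24 permutations found are exactly the permutations of {3, 5, 6, 7}, with 0, 1, 2 and 4 left fixed.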
Lemma 18 Every T-circuit fixes 0 and 2^i for all i.

For k × k T-circuits, k > 3, there is an added restriction. As T gates are 3 × 3, there can be no
k × k gates in the circuit, so by Lemma 11 the circuit must compute an even permutation. On the
other hand, we will show that these are the only restrictions on T-constructible permutations. We 
will do this by choosing an arbitrary even permutation, and then giving an explicit construction 
of a circuit which computes it using no temporary storage. The first step is to decompose the 
permutation into a product of pairs of disjoint transpositions. 

Lemma 19 For n > 4, any even permutation in S_n may be written as the product of pairs of
disjoint transpositions. If a permutation π moves k indices, it may be decomposed into no more than
pairs of transpositions. 

Proof: By a pair of disjoint transpositions, we mean something of the form (a, b)(c, d) where
a, b, c, d are distinct. For k ≥ 3, (x_0, x_1, ..., x_k) = (x_0, x_1)(x_{k−1}, x_k)(x_0, x_2, x_3, ..., x_{k−1}). Since
(x_0, x_1) and (x_{k−1}, x_k) are disjoint, iteratively applying this decomposition process will convert an
arbitrary cycle into a product of pairs of disjoint transpositions possibly followed by a single
transposition, a 3-cycle or both.

Consider an arbitrary permutation π = c_0 c_1 ... c_k, where c_0, ..., c_k are the disjoint cycles in
its cycle decomposition. As shown above, we may rewrite this as π = κ_1 ... κ_m τ_1 ... τ_p σ_1 ... σ_q,
where the κ_i are pairs of disjoint transpositions, the τ_i are transpositions, and the σ_i are 3-cycles.
As the τ_i come from pairwise disjoint cycles, they must in turn be pairwise disjoint. Moreover,
there must be an even number of them as π was assumed to be even, and the κ_i and σ_i are all
even. Pairing up the τ_i arbitrarily leaves an expression of the form κ_1 ... κ_{m+p/2} σ_1 ... σ_q. Again, the
σ_i are pairwise disjoint. Note that (a, b, c)(d, e, f) = [(a, b)(d, e)][(a, c)(d, f)]; we may therefore
rewrite any pair of disjoint 3-cycles as two pairs of disjoint transpositions. Iterating this process
leaves at most one 3-cycle, (x, y, z). Since we are working in A_n for n > 4, there are at least two
other indices, v, w. Using these, we have (x, y, z) = [(x, y)(v, w)][(v, w)(x, z)].

A careful count of transposition pairs gives the bound in the statement of the lemma. This
bound is tight in the case of a permutation consisting of a single (4n + 3)-cycle. □
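
The three cycle identities used in this proof can be checked mechanically. The Python sketch below assumes that products are read left to right, that is, the leftmost factor acts first; the identities hold under that reading. The helpers `cycle` and `ltr_product` are hypothetical names, and the concrete indices 0..5 stand in for the symbols in the proof.

```python
def cycle(n, *elems):
    """Permutation of range(n), as a list, sending each listed element to the next one."""
    p = list(range(n))
    for a, b in zip(elems, elems[1:] + elems[:1]):
        p[a] = b
    return p

def ltr_product(*perms):
    """Left-to-right product: the leftmost factor is applied first."""
    out = list(range(len(perms[0])))
    for p in perms:
        out = [p[x] for x in out]
    return out

# (x_0, ..., x_k) = (x_0, x_1)(x_{k-1}, x_k)(x_0, x_2, ..., x_{k-1}) with k = 5
lhs1 = cycle(6, 0, 1, 2, 3, 4, 5)
rhs1 = ltr_product(cycle(6, 0, 1), cycle(6, 4, 5), cycle(6, 0, 2, 3, 4))

# (a, b, c)(d, e, f) = [(a, b)(d, e)][(a, c)(d, f)]
lhs2 = ltr_product(cycle(6, 0, 1, 2), cycle(6, 3, 4, 5))
rhs2 = ltr_product(cycle(6, 0, 1), cycle(6, 3, 4), cycle(6, 0, 2), cycle(6, 3, 5))

# (x, y, z) = [(x, y)(v, w)][(v, w)(x, z)]
lhs3 = cycle(5, 0, 1, 2)
rhs3 = ltr_product(cycle(5, 0, 1), cycle(5, 3, 4), cycle(5, 3, 4), cycle(5, 0, 2))

print(lhs1 == rhs1, lhs2 == rhs2, lhs3 == rhs3)
```

Reading the same products right to left would yield the inverse cycles instead, so the composition order matters when turning the decomposition into a circuit.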

By Lemma 19 it suffices to show that we may construct a circuit for an arbitrary disjoint
transposition pair. We begin with an important special case. On n wires, a N^1_{2^n−4} gate computes
the permutation κ_0 = (2^n − 4, 2^n − 3)(2^n − 2, 2^n − 1), which may be implemented by 8(n − 5) T
gates [1, Corollary 7.4].

Lemma 20 On n wires, the permutation κ_0 = (2^n − 4, 2^n − 3)(2^n − 2, 2^n − 1) is T-constructible.

Consider now an arbitrary disjoint transposition pair, κ = (a, b)(c, d). Given a permutation π
with the property π(a) = 2^n − 4, π(b) = 2^n − 3, π(c) = 2^n − 2, π(d) = 2^n − 1, we have κ = π κ_0 π^{−1},
where κ_0 is the permutation in Lemma 20. We have a circuit which computes κ_0. Given a circuit
that computes π, we may obtain a circuit computing π^{−1} by reversing it. We now construct a
circuit computing π.

Lemma 21 Suppose n > 3, and 0 ≤ a, b, c, d < 2^n. Further suppose that none of a, b, c, d is 0, or
of the form 2^i. Then there exists a T-constructible permutation π with the property π(a) = 2^n − 1,
π(b) = 2^n − 2, π(c) = 2^n − 3, π(d) = 2^n − 4, computable by a circuit of no more than 5n − 2 T
gates.

Proof: To simplify notation, set M = 2^{n−1} and m = n − 1. Now, we construct π in five stages.
First, we build a permutation π_a such that π_a(a) = M + 4. Then, we build π_b such that π_b ∘ π_a(b) =
M + 1, and π_b(M + 4) = M + 4. Similarly, π_c will fix M + 1 and M + 4, while π_c ∘ π_b ∘ π_a(c) =
M + 2, and π_d will fix M + 1, M + 2, M + 4 while π_d ∘ π_c ∘ π_b ∘ π_a(d) = M + 7. Finally, we build
a circuit that maps M + 4 → 2M − 4, M + 1 → 2M − 3, M + 2 → 2M − 2, and M + 7 → 2M − 1.

By hypothesis, a is not 0 or of the form 2^i. This means that a has at least two 1s in its binary
expansion, say in positions h_a and k_a. Apply T gates with controls on positions h_a and k_a to set
the second and m-th bits. More precisely, let z_a = 2^{h_a} + 2^{k_a}, apply N^M_{z_a} iff a has a 0 in the (n − 1)st
bit and N^4_{z_a} iff a has a 0 in the 2nd bit. Now, apply T gates with the controls on the m-th and 2nd
bits to set the remaining bits to 0. Let π_a be the permutation computed by the circuit given above.

π_a(b) must again have two nonzero bits in its binary expansion; since b ≠ a implies π_a(b) ≠
π_a(a), some nonzero bit of π_a(b) lies on neither the m-th nor the 2nd wire. Controlling by this and
another bit, use the techniques of the previous paragraph to build a circuit taking π_a(b) → M + 1.
By construction, this fixes M + 4; let the permutation computed by this circuit be π_b.

Consider now the nonzero bits of c' = π_b ∘ π_a(c). Again, since a, b ≠ c, we have M + 4, M + 1 ≠
c'. Therefore, there must be at least one bit in which c' differs from M + 4. This bit could be the
m-th or the second bit, and c' could have a zero in this position. However, as c' is guaranteed
to have at least 2 non-zero bits, there must be some other bit which is 1 in c' and 0 in M + 4.
Similarly, there must be some bit which is 1 in c' and 0 in M + 1. Controlling by these two bits
(or, if they are the same bit, by this bit and any other bit which is 1 in c'), we may use the above
method to set c' → M + 2.

Next, consider the nonzero bits of d' = π_c ∘ π_b ∘ π_a(d). First, suppose there are two which
are not on the m-th wire. Controlling by these can take d' → M + 7 without affecting any of the
other values, as none of M + 1, M + 2, M + 4 have 1s in both these positions. If there are no two
1s in the binary expansion of d' which both lie off the m-th wire, there can be at most two 1s in
the binary expansion, one of which lies on the m-th wire. Since a, b, c ≠ d, the second must lie on
some wire which is not the 0th, 1st, or 2nd; in this case we may again control by these two bits to
take d' → M + 7 without affecting other values.
Finally, apply N^^^ and N^_^2 gates, and then a Njl^^l circuit. The reader may verify that this
completes stage 5. Each of the first 4 stages takes at most n T gates, as we flip at most n bits in 
each. The final stage uses exactly n — lT gates. q 

We now have a key result to prove. 

Theorem 22 Every T-constructible permutation in S_{2^n} fixes 0 and 2^i for all i, and is even if n >
3. Conversely, every permutation of this form is T-constructible. A T-constructible permutation
which moves s indices requires at most l>{s + l)(3?i — 1) T gates. There are (1/2)(2^n − n − 1)!
T-constructible permutations in S_{2^n}.

Proof: We have already dealt with the case n = 3; hence suppose n > 3. The first statement
follows directly from Lemmas 11 and 18. Now let π ∈ S_{2^n} be an arbitrary even permutation fixing
0, 2^i. Use the method of Lemma 19 to decompose π into pairs of disjoint transpositions which
fix 0, 2^i. We are justified in using Lemma 19 because, for n > 3, there are at least five numbers
between 0 and 2^n which are not of the form 0 or 2^i. Finally, using the circuits implied by
Lemmas 20 and 21 we may construct circuits for each of these transposition pairs. Chaining
these circuits together gives a circuit for the permutation π. Collecting the length bounds of the
various lemmas cited gives the length bound in the theorem. The final claim then follows. □

3.2 Circuit Equivalences 

Given a (possibly long) reversible circuit to perform a specified task, one approach to reducing 
the circuit size is to perform local optimizations using circuit equivalences. The idea is to find 
subcircuits amenable to reduction. This direction is pursued in a paper by Iwama et al. 0, which 
examines circuit transformation rules for generalized-CNOT circuits which only alter one bit of 
the circuit. In their scenario, other bits may be altered during computation, so long as they are 
returned to their initial state by the end of the computation. We present a more general framework 
for deriving equivalences, from which many of the equivalences from |H| follow as special cases. 
First, let us introduce notation to better deal with control bits. 

Definition 23 Let $G^i$ be a reversible gate that only affects wires corresponding to the 1s in the
binary expansion of $i$ (as in an $N^i$ gate). Let the bitwise Boolean product $i \cdot j = 0$. Then define
$V_j(G^i)$ as the gate which computes $G^i$ iff the wires specified by $j$ all carry a 1.

In particular, $V_j(N^i) = N^i_j$, and $V_k V_j(G^i) = V_{k+j}(G^i)$. Addition, multiplication, etc., of lower
indices will always be taken to be bitwise Boolean, with $+$, $\cdot$, $\oplus$ representing OR, AND, and XOR
respectively. We denote the bitwise complement of $x$ as $\bar{x}$.

Lemma 24 Let $K$ be an $n \times n$ reversible circuit such that $K(0 x_1 \ldots x_{n-1}) = (0 x_1 \ldots x_{n-1})$, and let
$f : B^{n-1} \to B^{n-1}$ be the function defined by $K(1 x_1 \ldots x_{n-1}) = (1 f(x_1 \ldots x_{n-1}))$. Then $f$ is a
well-defined permutation in $S_{2^{n-1}}$, and if $F$ is a circuit computing $f$, then $V_1(F) = K$.

Proof: $K$, by hypothesis, permutes the inputs with a leading 0 amongst themselves. By reversibility,
it must permute inputs with a leading 1 amongst themselves as well. □

Definition 25 The commutator of permutations $P$ and $Q$, denoted $[P, Q]$, is $PQP^{-1}Q^{-1}$.

The commutator concept is useful for moving gates past each other since PQ = [P, Q] QP. 
Moreover, it has reasonable properties with respect to control bits as the following result indicates. 

Corollary 26 $[V_h(G^i), V_k(H^j)] = V_{(h+k) \cdot \overline{(i+j)}}\big([V_{h \cdot j}(G^i), V_{k \cdot i}(H^j)]\big)$.

Proof: The corollary provides a circuit equivalent to the commutator of two given gates with 
arbitrary control bits. Namely, such a circuit can be constructed in two steps. First, identify wires 
which act as control for one gate but are not touched by the other gate. Second, connect the latter 
gate to every such wire so that the wire controls the gate. 

By induction, it suffices to show that this procedure can be done to one such wire. Without loss 
of generality, suppose control bits and only control bits appear on the first wire. Then the input 
to this wire goes through the circuit unchanged. At least one of the two gates whose commutator 
is being computed must, by hypothesis, be controlled by the first wire. Therefore, on an input of 
zero to the first wire, this gate (and therefore its inverse) leaves all signals unchanged. Since the 
other gate appears along with its inverse, the whole circuit leaves the input unchanged. Our result 
now follows from Lemma 24. □

If we are computing the commutator of generalized CNOT gates, then we may pick $G^i$ and $H^j$ to
be single inverters $N^i$, $N^j$ with $i$, $j$ having only a single 1 apiece in their binary expansions. Then
we must have $h \cdot j = 0$ or $j$, and $k \cdot i = 0$ or $i$. The four cases are accounted for as follows:

Lemma 27 Let $i$, $j$ have only a single 1 apiece in their binary expansions. Then $[N^i, N^j_i] = N^j$,
$[N^i_j, N^j] = N^i$, $[N^i, N^j] = I$, and $[N^i_j, N^j_i] = N^j_i N^i_j$.

Proof: As these equivalences all involve only 2-bit circuits, we may check them for $i = 0$, $j = 1$
by evaluating both sides of each equivalence on each of 4 inputs. □
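
The exhaustive check described in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: permutations are stored as tuples, circuits compose left to right, and the helper names (`then`, `inverse`, `commutator`, `inverter`) and the wire masks 1 and 2 standing for $i$ and $j$ are hypothetical choices, not notation from the paper.

```python
# Permutations on 2 wires are tuples p where p[x] is the output for input x.
SIZE = 4

def then(p, q):
    """Circuit p followed by circuit q (left-to-right composition)."""
    return tuple(q[p[x]] for x in range(SIZE))

def inverse(p):
    inv = [0] * SIZE
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

def commutator(p, q):
    """[P, Q] = P Q P^-1 Q^-1."""
    return then(then(then(p, q), inverse(p)), inverse(q))

def inverter(target, control=0):
    """Flip the target bits iff every control bit carries a 1."""
    return tuple(x ^ target if (x & control) == control else x
                 for x in range(SIZE))

I_MASK, J_MASK = 1, 2              # assumed masks for the wires i and j
IDENT = tuple(range(SIZE))
Ni, Nj = inverter(I_MASK), inverter(J_MASK)
Ni_j = inverter(I_MASK, J_MASK)    # inverter on i controlled by j
Nj_i = inverter(J_MASK, I_MASK)    # inverter on j controlled by i

# The four cases of Lemma 27, checked on all 4 inputs at once.
lemma27_holds = (
    commutator(Ni, Nj_i) == Nj
    and commutator(Ni_j, Nj) == Ni
    and commutator(Ni, Nj) == IDENT
    and commutator(Ni_j, Nj_i) == then(Nj_i, Ni_j)
)
```

The same helpers also confirm the identity $PQ = [P, Q]QP$ used to move gates past each other.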

3.3 CT|N and C|T Constructible Permutations

While an arbitrary CNT-circuit may have the C, N, and T gates interspersed arbitrarily, we first 
consider circuits in which these gates are segregated by type. 

Definition 28 For any gate libraries $L_1 \ldots L_k$, an $L_1|\ldots|L_k$-circuit is an $L_1$-circuit followed by
an $L_2$-circuit, ..., followed by an $L_k$-circuit. A permutation computed by an $L_1|\ldots|L_k$-circuit is
$L_1|\ldots|L_k$-constructible.

Figure 7: Equivalences between reversible circuits used in our constructions.



A CNT-circuit with all N gates appearing at the right end is called a CT|N circuit. 

Theorem 29 Let $\pi$ be CNT-constructible. Then $\pi$ is also CT|N-constructible. Moreover, $\pi$ uniquely
determines the permutations $\pi_{CT}$ and $\pi_N$ computed by the CT and N sub-circuits, respectively.

Proof: We move all the N gates toward the outputs of the circuit. Each box in Figure 7 indicates
a way of replacing an N|CT circuit with a CT|N circuit. The equivalences in this figure come from
Corollary 26. Moreover, every possible way for an N gate to appear to the immediate left of a
C or a T is accounted for, up to permuting the input and output wires. Now, number the non-N
gates in the circuit in a reverse topological order starting from the outputs. In particular, if two
gates appear at the same level in a circuit diagram, they must be independent, and one can order
them arbitrarily. Let $d$ be the number of the highest-numbered gate with an N gate to its immediate
left. All N gates past the $d$-th gate $G$ can be reordered with the $G$ gate without introducing
new N gates on the other side of $G$, and without introducing new gates between the N gates and
the outputs. In any event, as there are no remaining N gates to the left of $G$, $d$ decreases. This
process terminates when all the N gates are clustered together at the circuit outputs. If we always
cancel redundant pairs of N gates, then no more than two new gates will be introduced for each
non-inverter originally in the circuit; additionally, there will be at most $n$ N gates when the process
is complete. Thus if the original circuit had $l$ gates, then the new circuit has at most $3(l - 1)$
gates. Note that C and T gates (and hence CT-circuits) fix 0. Thus $\pi(0) = \pi_N(0)$, so $\pi_N = N^{\pi(0)}$,
and $\pi_{CT} = \pi N^{\pi(0)}$. □

Thus, if we want a CNT-circuit computing a permutation $\pi$, we can quickly compute $\pi_N$ and
then simplify the problem to that of finding a CT-circuit for $\pi\pi_N$. By Theorem 29 we know that
a minimal-gate circuit of this form has at most roughly three times as many gates as the gate-minimal
circuit computing $\pi$.
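
As a small illustration of this reduction, the sketch below splits a permutation into its N part and its zero-fixing CT part. It assumes permutations are stored as tuples with `perm[x]` the image of `x` and that circuits compose left to right; the function name `split_ct_n` is a hypothetical helper, not something defined in the paper.

```python
def split_ct_n(perm):
    """Return (pi_ct, pi_n) with perm = pi_ct followed by pi_n.

    pi_n is the inverter layer N^{perm(0)} (XOR with the image of 0),
    and pi_ct = perm followed by N^{perm(0)}, which fixes 0.
    """
    mask = perm[0]
    pi_n = tuple(x ^ mask for x in range(len(perm)))
    pi_ct = tuple(y ^ mask for y in perm)
    return pi_ct, pi_n

# Example on three wires (an arbitrary permutation of 0..7).
example = (5, 3, 0, 7, 1, 6, 2, 4)
pi_ct, pi_n = split_ct_n(example)
recomposed = tuple(pi_n[pi_ct[x]] for x in range(8))
```

Because XOR with a fixed mask is an involution, applying the same inverter layer again recovers the original permutation.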

The next natural question is whether an arbitrary CT-circuit is equivalent to some T|C circuit.
The equivalences in Figure 7 suggest that the answer is yes. However, the proof of Theorem 29
requires that many N gates be able to simultaneously move past a C or T gate, while Figure 7 only
shows how to move a single C gate past a single T gate.

Lemma 30 The permutation $\pi$ computed by a T|C-circuit determines the permutations $\pi_T$ and
$\pi_C$ computed by the sub-circuits. An even permutation is T|C-constructible iff it fixes 0 and the
images of inputs of the form $2^i$ are linearly independent over $\mathbb{F}_2$.

Proof: Let $\pi$ be an arbitrary permutation. If $\pi$ is T|C-constructible, then images of the inputs
$2^i$ are unaffected by the T subcircuit; by Lemma Q they must be mapped to linearly independent
values by the C subcircuit. This mapping of basis vectors completely specifies the permutation
$\pi_C$ computed by the C subcircuit, and therefore also the permutation $\pi_T = \pi\pi_C^{-1}$ computed by
the T subcircuit. Conversely, suppose $\pi$ is even and fixes 0, and the images of $2^i$ are linearly
independent. Then there is some C-circuit taking the values $2^i$ to their images under $\pi$. Let it
compute the permutation $\pi_C$; then $\pi\pi_C^{-1}$ fixes the values 0 and $2^i$ by construction. Theorem 22
therefore guarantees that $\pi\pi_C^{-1}$ is T-constructible. □

We will later use this result to show the existence of CT-constructible permutations which are 
not T|C constructible. 

3.4 T|C|T|N-Constructible Permutations

We are now ready to prove TheoremEl. According to Lemma 30, zero-fixing even permutations
are T|C-constructible if they map inputs of the form $2^i$ in a certain way. This suggests that T|C-circuits
account for a relatively large fraction of such permutations.

Theorem 31 Every zero-fixing permutation in $S_{2^3}$ and every zero-fixing even permutation in $S_{2^n}$
for $n \geq 4$ is T|C|T-constructible, and hence is CT-constructible. None requires more than $n^2$ C
gates and $3(2^n + n + 1)(3n - 7)$ T gates.

Proof: Let $\pi$ be any zero-fixing permutation. Note that if the images of $2^i$ under $\pi$ were linearly
independent, Lemma 30 would imply that $\pi$ was T|C-constructible. So, we will build a permutation
$\pi_T$ with the property that the images of $2^i$ under $\pi\pi_T$ are linearly independent, ensuring that
$\pi\pi_T$ is T|C-constructible. Given a T|C-circuit for $\pi\pi_T$ and a T-circuit for $\pi_T$, we can reverse the
circuit for $\pi_T$ and append it to the end of the T|C-circuit for $\pi\pi_T$ to give a T|C|T-circuit for $\pi$. All
that remains is to show we can build one such $\pi_T$.

The basis vectors $2^i$ must be mapped either to themselves, to other basis vectors, or to vectors
with at least two 1s. Let $i_1 \ldots i_k$ be the indices of basis vectors which are not the images of other
basis vectors, and let $j_1 \ldots j_k$ be the indices of basis vectors whose images have at least two 1s.
Let $i'_1 \ldots i'_{n-k}$ and $j'_1 \ldots j'_{n-k}$ be the indices which are not in the $i_m$ and $j_m$ respectively. Consider
the matrix $M_\pi$ in which the $i$th column is the binary expansion of $\pi(2^i)$. We take the entries
of $M_\pi$ to be elements of $\mathbb{F}_2$. Our indexing system divides $M_\pi$ into four submatrices: $M_\pi(i,j)$,
$M_\pi(i',j)$, $M_\pi(i,j')$, and $M_\pi(i',j')$. By construction, $M_\pi(i,j)$ and $M_\pi(i',j')$ are square, $M_\pi(i',j')$ is a
permutation matrix, and $M_\pi(i,j')$ is a zero matrix. Therefore, $\det M_\pi = \det M_\pi(i,j)$, and $M_\pi$ is
invertible iff $M_\pi(i,j)$ is. Moreover, there is an invertible linear transformation, computable by
column-reduction, which zeroes out the matrix $M_\pi(i',j)$ without affecting $M_\pi(i,j)$ or $M_\pi(i',j')$. As
this transformation $L$ is invertible, it corresponds to a permutation $\pi_L$, and the matrix $M_\pi L$ is the
matrix of images of $2^i$ under the permutation $\pi_L\pi$. In particular, the columns of $M_\pi L$ must all be
different, which implies that the columns of $M_\pi(i,j)$ must all be different. Moreover, $\pi_L$ is linear,
and therefore zero-fixing; hence $M_\pi(i,j)$ can have no zero columns. Taken together, these facts
imply that for $k = 1, 2$, $M_\pi(i,j)$ is invertible, hence so is $M_\pi$, thus $\pi$ is T|C-constructible.

Suppose $k \geq 3$, and consider the family of matrices $A(p)$ defined as follows. $A(p)$ is a $p \times p$
matrix with 1s on the diagonal, 1s in the first row, and 1s in the first column, except possibly in
the (1,1) entry, which is 1 iff $p$ is odd. Row-reducing the $A(p)$ to lower triangular matrices quickly
shows that the $A(p)$ are invertible for all $p$. Moreover, for $p \geq 3$, there are at least two 1s in every
column. Therefore, there is a T-constructible permutation $\pi_T$ such that $M_{\pi\pi_T}(i,j) = A(k)$. Thus $\pi\pi_T$
is T|C-constructible, and $\pi$ is T|C|T-constructible.
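
The two properties claimed for the matrices $A(p)$ are easy to spot-check numerically. The sketch below is an illustration only: the function names are hypothetical, rows are packed into Python integers, and the rank is computed by plain Gaussian elimination over $\mathbb{F}_2$.

```python
def a_matrix(p):
    """A(p): 1s on the diagonal, first row and first column;
    the (1,1) entry is 1 iff p is odd."""
    rows = [[1 if (r == c or r == 0 or c == 0) else 0 for c in range(p)]
            for r in range(p)]
    rows[0][0] = p % 2
    return rows

def rank_f2(rows):
    """Rank over F_2 via Gaussian elimination on bit-packed rows."""
    width = len(rows[0])
    vecs = [sum(bit << c for c, bit in enumerate(row)) for row in rows]
    rank = 0
    for col in range(width):
        pivot = next((i for i in range(rank, len(vecs))
                      if (vecs[i] >> col) & 1), None)
        if pivot is None:
            continue
        vecs[rank], vecs[pivot] = vecs[pivot], vecs[rank]
        for i in range(len(vecs)):
            if i != rank and (vecs[i] >> col) & 1:
                vecs[i] ^= vecs[rank]
        rank += 1
    return rank

def min_column_weight(rows):
    """Smallest number of 1s found in any column."""
    return min(sum(col) for col in zip(*rows))

all_invertible = all(rank_f2(a_matrix(p)) == p for p in range(1, 13))
```

For $p = 2$ the first column contains a single 1, which is why the argument treats $k = 1, 2$ separately.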

Finally, we know from Corollary 9 that no more than $n^2$ gates are necessary to compute $\pi_C$.
At most $2n$ indices need be moved by $\pi_T$, and no more than $2^n - n - 1$ can be moved by the
T-constructible part of $\pi$. Thus by Theorem 22 we need no more than $3(2n + 1)(3n - 7)$ gates for
$\pi_T$ and no more than $3(2^n - n)(3n - 7)$ gates for the T-constructible part. Adding these gives
$3(2^n + n + 1)(3n - 7)$, the gate-count estimate above. □

Corollary 32 There exist T|C|T-constructible permutations which are not T|C-constructible.

Proof: The permutation $\pi = (2, 6)(4, 7)$ fixes 0 and is even, hence is T|C|T-constructible in $S_{2^n}$
for all $n \geq 3$ by Theorem 31. However, $\pi(1) \oplus \pi(2) = 1 \oplus 6 = 7 = \pi(4)$, hence by Lemma 30 $\pi$
is not T|C-constructible. □
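
The criterion of Lemma 30 and the counterexample above can be checked mechanically. The following sketch assumes permutations are stored as tuples indexed by input value; all function names are hypothetical helpers introduced for illustration.

```python
def perm_from_cycles(size, cycles):
    """Build a permutation tuple from disjoint cycles."""
    perm = list(range(size))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            perm[a] = b
    return tuple(perm)

def is_even(perm):
    """A permutation is even iff (size - number of cycles) is even."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = perm[x]
    return (len(perm) - cycles) % 2 == 0

def rank_f2(vectors):
    """Rank of bit-vectors (ints) over F_2 using an XOR basis."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)      # clears b's leading bit from v if set
        if v:
            basis.append(v)
    return len(basis)

def satisfies_lemma30(perm, n_wires):
    """Zero-fixing, even, and images of 2^i linearly independent."""
    images = [perm[1 << i] for i in range(n_wires)]
    return perm[0] == 0 and is_even(perm) and rank_f2(images) == n_wires

pi = perm_from_cycles(8, [(2, 6), (4, 7)])
```

Here `pi` is even and fixes 0, but the images 1, 6, 7 of the basis vectors have rank 2, so the check fails as the proof predicts.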

Theorem 33 Every permutation in $S_{2^n}$ for $n = 1, 2, 3$ and every even permutation in $S_{2^n}$ for $n > 3$
is T|C|T|N-constructible, and hence CNT-constructible. None requires more than $n^2$ C gates, $n$ N
gates, and $3(2^n + n + 1)(3n - 7)$ T gates.

Proof: Let $\pi$ be any permutation; then $\pi' = \pi N^{\pi(0)}$ fixes 0. For $n = 1$, $\pi'$ must be the identity; for
$n = 2$, $\pi'$ permutes 1, 2, 3; any such permutation is linear, hence $\pi'$ is C-constructible. For $n = 3$,
$\pi'$ is T|C|T-constructible; for $n > 3$, $\pi'$ is T|C|T-constructible iff it is even, which happens iff $\pi$ is
even. Thus in all cases there is a T|C|T-circuit computing $\pi'$; appending $N^{\pi(0)}$ to it gives a
T|C|T|N-circuit computing $\pi$. □

We note that the size of a truth table for a circuit with $n$ inputs and $n$ outputs is $n 2^n$ bits. The
synthesis procedure used in the theorems above clearly runs in time proportional to the number of
gates in the final circuit. This is $O(n 2^n)$, hence the synthesis procedure detailed in the theorems
has linear runtime in the input size.

Just as in Corollary 9, we may ask how far from optimal the foregoing construction is for long
circuits. There are $(2^n)!/2$ even permutations in $S_{2^n}$, and these are all CNT-constructible. Using
Stirling's approximation, $\log(k!) \approx k \log k$, and Lemma 8 gives:

Corollary 34 Worst-case CNT-circuits on $n$ wires require $\Omega(n 2^n / \log n)$ gates.

So, for long CNT-circuits, the algorithm implied by Theorem 33 is asymptotically suboptimal
by, at worst, a logarithmic factor, as it produces circuits of length $O(n 2^n)$. This is remarkably
similar to the result of Corollary 9, in which we found that using row reduction to build C-circuits
is asymptotically suboptimal by a logarithmic factor in the case of long C-circuits. However, even
a constant improvement in size is very desirable, and circuits for practical applications are almost
never of the worst-case type considered in Corollaries 9 and 34.
4 Optimal Synthesis



We will now switch focus, and seek optimal realizations for permutations we know to be CNT- 
constructible. A circuit is optimal if no equivalent circuit has smaller cost; in our case, the cost 
function will be the number of gates in the circuit. 

Lemma 35 (Property of Optimality) If $B$ is a sub-circuit of an optimal circuit $A$, then $B$ is optimal.

Proof: Suppose not. Then let $B'$ be a circuit with fewer gates than $B$, but computing the same
function. If we replace $B$ by $B'$, we get another circuit $A'$ which computes the same function as $A$.
But since we have only modified $B$, $A'$ must be as much smaller than $A$ as $B'$ is smaller than $B$. $A$
was assumed to be optimal, hence this is a contradiction. (Note that equivalent, optimal circuits
can have the same number of gates.) □

The algorithm detailed in this section relies entirely on the property of optimality for its cor- 
rectness. Therefore, any cost function for which this property holds may, in principle, be used 
instead of gate count. 

Lemma 35 allows us to build a library of small optimal circuits by dynamic programming
because the first $m$ gates of an optimal $(m+1)$-gate circuit form an optimal subcircuit. Therefore,
to examine all optimal $(m+1)$-gate circuits, we iterate through optimal $m$-gate circuits and add
single gates at the end in all possible ways. We then check the resulting circuits against the library,
and eliminate any which are equivalent to a smaller circuit. In fact, instead of storing a library
of all optimal circuits, we store one optimal circuit per synthesized permutation and also store
optimal circuits of a given size together.
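
A minimal sketch of this library-building step, for the NT gate library on three wires, is given below. It is an illustration rather than the implementation used for the experiments: permutations are stored as tuples, only the number of permutations per optimal size is recorded, and all names are hypothetical.

```python
N_WIRES = 3
SIZE = 1 << N_WIRES
IDENTITY = tuple(range(SIZE))

def not_gate(t):
    """N gate: invert wire t."""
    return tuple(x ^ (1 << t) for x in range(SIZE))

def toffoli(t):
    """T gate: invert wire t iff the other two wires both carry 1."""
    controls = (SIZE - 1) ^ (1 << t)
    return tuple(x ^ (1 << t) if (x & controls) == controls else x
                 for x in range(SIZE))

NT_GATES = [not_gate(t) for t in range(N_WIRES)] + \
           [toffoli(t) for t in range(N_WIRES)]

def level_counts(gates):
    """counts[c] = number of permutations whose optimal circuit has c gates.

    Level c+1 is obtained by appending every gate to every optimal
    c-gate circuit and discarding permutations already seen.
    """
    seen = {IDENTITY}
    frontier = [IDENTITY]
    counts = [1]
    while frontier:
        nxt = []
        for perm in frontier:
            for g in gates:
                new = tuple(g[y] for y in perm)   # perm followed by g
                if new not in seen:
                    seen.add(new)
                    nxt.append(new)
        if nxt:
            counts.append(len(nxt))
        frontier = nxt
    return counts

nt_counts = level_counts(NT_GATES)
```

Storing one representative circuit per permutation, as the text describes, would only require replacing the `seen` set by a dictionary from permutations to gate lists.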

One way to find an optimal circuit for a given permutation $\pi$ is to generate all optimal $k$-gate
circuits for increasing values of $k$ until a circuit computing $\pi$ is found. This procedure requires
$O((2^n)!)$ memory in the worst case ($n$ is the number of wires) and may require more memory than
is available. Therefore, we stop growing the circuit library at $m$-gate circuits, when hardware
limitations become an issue. The second stage of the algorithm uses the computed library of
optimal circuits and, in our implementation, starts by reading the library from a file. Since little
additional memory is available, we trade off runtime for memory.

We use a technique known as depth-first search with iterative deepening (DFID) [10]. After
a given permutation is checked against the circuit library, we seek circuits with $j = m + 1$ gates
that implement this permutation. If none are found, we seek circuits with $j = m + 2$ gates, etc.
This algorithm, in general, needs an additional termination condition to prevent infinite looping
for inputs which cannot be synthesized with a given gate library. For each $j$, we consider all
permutations optimally synthesizable in $m$ gates. For each such permutation $p$, we multiply $\pi$





CIRCUIT find_circ(COST, PERM)
    // assumes circuit library stored in LIB, holding optimal circuits of up to k gates
    if (COST ≤ k)
        // If PERM can be computed by a circuit with ≤ k gates,
        // such a circuit must be in the library
        return LIB[COST].find(PERM)
    else
        // Try building the goal circuit from k-gate circuits
        for each C in LIB[k]
            // Divide PERM by permutation computed by C
            PERM2 ← PERM * INVERSE(C.perm)
            // and try to synthesize the result
            TEMP_CCT ← find_circ(COST - k, PERM2)
            if (TEMP_CCT != NIL) return TEMP_CCT * C
        // Finally, if no circuit of the desired cost can be found
        return NIL



Figure 8: Finding a circuit of cost ≤ COST that computes permutation PERM
(NIL returned if no such circuit exists). TEMP_CCT and records in LIB represent
circuits, and include a field "perm" storing the permutation computed. The * character
means both multiplication of permutations and concatenation of circuits, and
NIL * <anything> = NIL.

by $p^{-1}$ and recursively try to synthesize the result using $j - m$ gates. When $j - m \leq m$, this
can be done by checking against the existing library. Otherwise, the recursion depth increases.
Pseudocode for this stage of our algorithm is given in Figure 8.
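
The procedure of Figure 8 can be rendered as a short runnable sketch. The version below is an illustration under stated assumptions, not the authors' implementation: it uses the NT gate library on three wires, a library of optimal circuits with at most $m = 2$ gates, left-to-right composition of circuits, and hypothetical names throughout. As a test case it synthesizes a C gate (control on wire 0, inverter on wire 1) from N and T gates.

```python
N_WIRES = 3
SIZE = 1 << N_WIRES
IDENTITY = tuple(range(SIZE))

def not_gate(t):
    return tuple(x ^ (1 << t) for x in range(SIZE))

def toffoli(t):
    controls = (SIZE - 1) ^ (1 << t)
    return tuple(x ^ (1 << t) if (x & controls) == controls else x
                 for x in range(SIZE))

GATES = {f"N{t}": not_gate(t) for t in range(N_WIRES)}
GATES.update({f"T{t}": toffoli(t) for t in range(N_WIRES)})

def then(p, q):
    """Permutation of circuit p followed by circuit q."""
    return tuple(q[p[x]] for x in range(SIZE))

def inverse(p):
    inv = [0] * SIZE
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

def build_library(m):
    """levels[c] maps each permutation of optimal cost c to one circuit."""
    seen = {IDENTITY}
    levels = [{IDENTITY: []}]
    for _ in range(m):
        nxt = {}
        for perm, circ in levels[-1].items():
            for name, gate in GATES.items():
                new = then(perm, gate)
                if new not in seen:
                    seen.add(new)
                    nxt[new] = circ + [name]
        levels.append(nxt)
    return levels

def find_circ(cost, perm, levels):
    """Return a circuit with exactly `cost` gates for perm, or None."""
    m = len(levels) - 1
    if cost <= m:
        return levels[cost].get(perm)          # library lookup
    for c_perm, c_circ in levels[m].items():
        sub = find_circ(cost - m, then(perm, inverse(c_perm)), levels)
        if sub is not None:
            return sub + c_circ                # sub-circuit followed by C
    return None

def synthesize(perm, levels, max_cost):
    """Iterative deepening; max_cost is the extra termination condition."""
    for cost in range(max_cost + 1):
        circ = find_circ(cost, perm, levels)
        if circ is not None:
            return circ
    return None

cnot = tuple(x ^ 2 if x & 1 else x for x in range(SIZE))
circuit = synthesize(cnot, build_library(2), max_cost=6)
```

Since costs are tried in increasing order, the first circuit returned is gate-minimal; the `max_cost` bound plays the role of the termination condition mentioned in the text.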

Table 1: Number of permutations computable in an optimal L-circuit using a given number
of gates. $L \subseteq$ CNTS. Runtimes are in seconds for a 2GHz Pentium-4 Xeon CPU.

In addition to being more memory-efficient than straightforward dynamic programming, our
algorithm is faster than branching over all possible circuits. To quantify these improvements,
consider a library of circuits of size $m$ or less, containing $l_m$ circuits of size $m$. We analyze the
efficiency of the algorithms discussed by simulating them on an input permutation of cost $k$. Our
algorithm requires $l_m^{\lfloor (k-1)/m \rfloor}$ references to the circuit library. Simple branching is no better than
our algorithm with $m = 1$, and thus takes at least $l_1^k$ steps, which is $l_1^k / l_m^{\lfloor (k-1)/m \rfloor}$ times more than
our algorithm. A speed-up can be expected because $l_m < l_1^m$, but specific numerical values of
that expression depend on the numbers of suboptimal and redundant optimal circuits of length
$m$. Indeed, Table 1 lists values of $l_m$ for various subsets of the CNTS gate library and $m = 3$.
For example, for the NT gate library, $k = 12$, $\lfloor (k-1)/m \rfloor = 3$, $l_1 = 6$ and $l_3 = 88$. Therefore
the performance ratio is $l_1^k / l_m^{\lfloor (k-1)/m \rfloor} = 6^{12}/88^3 \approx 3194.2$. Yet, this comparison is incomplete
because it does not account for time spent building circuit libraries. We point out that this charge
is amortized over multiple synthesis operations. In our experiments, generating a circuit library
on three wires of up to three gates ($m = 3$) from the CNTS gate library takes less than a minute
on a 2-GHz Pentium-4 Xeon. Using such libraries, all of Table 1 can be generated in minutes,¹
but it cannot be generated even in several hours using branching.
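
The arithmetic behind the NT example can be reproduced directly. The helper names below are hypothetical; the formulas are the two cost estimates stated in the preceding paragraph.

```python
def library_references(l_m, k, m):
    """Estimated library references: l_m ** floor((k - 1) / m)."""
    return l_m ** ((k - 1) // m)

def branching_steps(l_1, k):
    """Lower bound on steps for simple branching: l_1 ** k."""
    return l_1 ** k

# NT library on three wires: k = 12, m = 3, l_1 = 6, l_3 = 88.
ratio = branching_steps(6, 12) / library_references(88, 12, 3)
```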

Let us now see what additional information we can glean from Table 1. Adding the C gate to
the NT library appears to significantly reduce circuit size, but further adding the S gate does not
help as much. To illustrate this, we show sample worst-case circuits on three wires for the NT,
CNT, and CNTS gate libraries in Figure 9.

The totals in Table 1 can be independently determined by the following arguments. Every
reversible function on three wires can be synthesized using the CNT gate library [20], and there
are 8! = 40,320 of these. All can be synthesized with the NT library because the C gate is 
redundant in the CNT library; see Figure On the other hand, adding the S gate to the library 
cannot decrease the number of synthesizable functions. Therefore, the totals in the NT and CNTS 
columns must be 40,320 as well. On the other side of the table, the number of possible N circuits is
just $2^3 = 8$ since there are three wires, and there can be at most one N gate per wire in an optimal
circuit (else we can cancel redundant pairs). By Theorem 29, the number of CN-constructible
permutations should be the product of the number of N-constructible permutations and the number 
of C constructible permutations, since any CN-constructible permutation can be written uniquely 
as a product of an N-constructible and a C-constructible permutation. So the total in the CN 
column should be the product of the totals in the C and N columns, which it is. Similarly, the total 
in the CNT column should be the product of the totals in the CT and N columns; this allows one 
to deduce the total number of CT-constructible permutations from values we know. Finally, we 
showed that there were 24 T-constructible permutations on 3 wires in Section 3, and Corollary 9
states that the number of permutations implementable on $n$ wires with C gates is $\prod_{i=0}^{n-1} (2^n - 2^i)$.
For $n = 3$ this yields 168 and agrees with Table 1.

We can also add to the discussion of T|C-constructible circuits we began in Section 3. By
Lemma 30, the number of T|C-constructible permutations can be computed as the product of
the numbers of T-constructible and C-constructible permutations. Table 1 mentions 24 T-circuits
and 168 C-circuits on three wires. The product, 4032, is less than 5040, the number of CT-constructible
permutations on three wires, as we would expect from Corollary 32.
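
The counting arguments of the last two paragraphs reduce to a few products, sketched below. The function name is hypothetical; the figures 24, 168, 4032, 5040 and 40,320 are the ones quoted in the text.

```python
from math import factorial, prod

def c_constructible_count(n):
    """Number of C-constructible permutations on n wires:
    the product of (2^n - 2^i) for i = 0 .. n-1."""
    return prod(2 ** n - 2 ** i for i in range(n))

n_count = 2 ** 3                      # N-constructible permutations
c_count = c_constructible_count(3)    # C-constructible permutations
cnt_total = factorial(8)              # all reversible functions on 3 wires
ct_count = cnt_total // n_count       # CNT total = CT total * N total
tc_count = 24 * c_count               # T|C-constructible, by Lemma 30
```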

Finally, the longest C-circuits we observed on 3, 4 and 5 wires merely permute the wires. Such
wire-permutations on $n$ wires never require more than $3(n - 1)$ gates. However, from Corollary 9
we know that for large $n$, worst-case C-circuits require $\Omega(n^2 / \log n)$ gates. Identifying specific
worst-case circuits and describing families with worst-case asymptotics remains a challenge.

¹ Although complete statistics for all 16! 4-wire functions are beyond our reach, average synthesis times are less than
one second when the input function can be implemented with eight gates or fewer. Functions requiring nine or more gates
tend to take more than 1.5 hours to synthesize. In this case, memory constraints limit our circuit library to 4-gate circuits,
and the large jump in runtime after the 8-gate mark is due to an extra level of recursion.

Figure 9: Worst-case L-circuits where L is NT, CNT and CNTS.

Finally, we note that while the exact runtime complexity of this algorithm is dependent on
characteristics of the gate library chosen, for a complete gate library it is exponential in
the number of input wires to the circuit (this is guaranteed by Corollary 34), and in fact must be at
least doubly-exponential in the number of input wires (that is, exponential in the size of the truth
table). Scalability issues, therefore, restrict this approach to small problems. On the other hand,
given that the state of the art in quantum computing is largely limited to about ten qubits, such small
circuits are of interest to physicists building quantum computing devices.

5 Quantum Search Applications 

Quantum computation is necessarily reversible, and quantum circuits generalize their reversible 
counterparts in the classical domain llHI . Instead of wires, information is stored on qubits, whose 
states we write as $|0\rangle$ and $|1\rangle$ instead of 0 and 1. There is an added complexity: a qubit can be in
a superposition state that combines $|0\rangle$ and $|1\rangle$. Specifically, $|0\rangle$ and $|1\rangle$ are thought of as vectors
of the computational basis, and the value of a qubit can be any unit vector in the space they span.
The scenario is similar when considering many qubits at once: the possible configurations of the 
corresponding classical system (bit-strings) are now the computational basis, and any unit vector 
in the linear space they span is a valid configuration of the quantum system. Just as the classical 
configurations of the circuit persist as basis vectors of the space of quantum configurations, so 
too classical reversible gates persist in the quantum context. Non-classical gates are allowed, in 
fact, any (invertible) norm-preserving linear operator is allowed as a quantum gate. However, 
quantum gate libraries often have very few non-classical gates flUl . An important example of a 
non-classical gate (and the only one used in this paper) is the Hadamard gate H. It operates on 
one qubit, and is defined as follows: $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, and $H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. Note that
because H is linear, giving the images of the computational basis elements defines it completely. 

During the course of a computation, the quantum state can be any unit vector in the linear space 
spanned by the computational basis. However, a serious limitation is imposed by quantum mea- 
surement, performed after a quantum circuit is executed. A measurement non-deterministically 
collapses the state onto some vector in a basis corresponding to the measurement being performed. 
The probabilities of outcomes depend on the measured state — basis vectors [nearly] orthogonal 
Figure 10: A high-level schematic of Grover's search algorithm.



to the measured state are least likely to appear as outcomes of measurement. If $H|0\rangle$ were measured
in the computational basis, it would be seen as $|0\rangle$ half the time, and $|1\rangle$ the other half.

Despite this limitation, quantum circuits have significantly more computational power than
classical circuits. In this work, we consider Grover's search algorithm, which is faster than any
known non-quantum algorithm for the same problem [6]. Figure 10 outlines a possible implementation
of Grover's algorithm. It begins by creating a balanced superposition of $2^n$ $n$-qubit states
which correspond to the indexes of the items being searched. These index states are then repeatedly
transformed using a Grover operator circuit, which incorporates the search criteria in the
form of a search-specific predicate $f(x)$. This circuit systematically amplifies the search indexes
that satisfy $f(x) = 1$ until a final measurement identifies them with high probability.

A key component of the Grover operator is a so-called "oracle" circuit that implements a
search-specific predicate $f(x)$. This circuit transforms an arbitrary basis state $|x\rangle$ to the state
$(-1)^{f(x)}|x\rangle$. The oracle is followed by (i) several Hadamard gates, (ii) a subcircuit which flips the
sign on all computational basis states other than $|0\rangle$, and (iii) more Hadamard gates. A sample
Grover-operator circuit for a search on 2 qubits is shown in Figure 11 and uses one qubit of
temporary storage [14]. The search space here is {0,1,2,3}, and the desired indices are 0 and
3. The oracle circuit is highlighted by a dashed line. While the portion following the oracle
is fixed, the oracle may vary depending on the search criterion. Unfortunately, most works on 
Grover's algorithm do not address the synthesis of oracle circuits and their complexity. According 
to Bettelli et al. |?|, this is a major obstacle for automatic compilation of high-level quantum 
programs, and little help is available. 

Figure 11: A Grover-operator circuit with oracle highlighted.

Lemma 36 [14] With one temporary storage qubit, the problem of synthesizing a quantum circuit
that transforms computational basis states $|x\rangle$ to $(-1)^{f(x)}|x\rangle$ can be reduced to a problem in the
synthesis of classical reversible circuits.

Proof: Define the permutation $\pi_f$ by $\pi_f(x,y) = (x, y \oplus f(x))$, and define a unitary operator
$U_f$ by letting it permute the states of the computational basis according to $\pi_f$. The additional
qubit is initialized to $|-\rangle = H|1\rangle$ so that $U_f|x,-\rangle = (-1)^{f(x)}|x,-\rangle$. If we now ignore the value
of the last qubit, the system is in the state $(-1)^{f(x)}|x\rangle$, which is exactly the state needed for
Grover's algorithm. Since a quantum operator is completely determined by its behavior on a
given computational basis, any circuit implementing $\pi_f$ implements $U_f$. As reversible gates may
be implemented with quantum technology, we can synthesize $U_f$ as a reversible logic circuit. □
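
The sign change in this proof can be traced with a tiny state-vector calculation. The sketch below is illustrative only: it tracks just the two amplitudes of $|x,0\rangle$ and $|x,1\rangle$, uses a hypothetical function name, and takes as its example the predicate marking the indices 0 and 3 from the two-qubit search mentioned earlier.

```python
from math import sqrt

def oracle_on_minus(f, x):
    """Apply pi_f to |x>|-> and return the amplitudes of |x,0> and |x,1>."""
    amp = {(x, 0): 1 / sqrt(2), (x, 1): -1 / sqrt(2)}     # |x>|->
    out = {}
    for (xx, y), a in amp.items():
        out[(xx, y ^ f(xx))] = a        # pi_f permutes the basis states
    return out[(x, 0)], out[(x, 1)]

def marked(x):
    """Example predicate: f(x) = 1 for the desired indices 0 and 3."""
    return 1 if x in (0, 3) else 0

results = {x: oracle_on_minus(marked, x) for x in range(4)}
```

For a marked `x` the two amplitudes swap, which is the same as multiplying $|x,-\rangle$ by $-1$; for an unmarked `x` nothing changes.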

Quantum computers implemented so far are severely limited by the number of simultaneously 
available qubits. While n qubits are necessary for Grover's algorithm, one should try to minimize 
the number of additional temporary storage qubits. One such qubit is required by Lemma 36 to
allow classical reversible circuits to alter the phase of quantum states. 

Corollary 37 For permutations $\pi_f(x,y) = (x, y \oplus f(x))$ such that $\{x : f(x) = 1\}$ has even
cardinality, no more temporary storage is necessary. For the remaining $\pi_f$, we need an additional
qubit of temporary storage.

Proof: The permutation $\pi_f$ swaps $(x,y)$ with $(x, y \oplus f(x))$, and therefore performs one
transposition for each element of $\{x : f(x) = 1\}$. It is therefore even exactly when this set has even
cardinality. The lemma follows from Corollary |--| 
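
The parity argument can be verified exhaustively for 3-input functions. The sketch below assumes a basis state $(x, y)$ is encoded as the index $2x + y$; the function names are hypothetical.

```python
from itertools import product

def pi_f(truth_table):
    """Permutation (x, y) -> (x, y XOR f(x)), with state index 2*x + y."""
    return tuple(2 * x + (y ^ truth_table[x])
                 for x in range(len(truth_table)) for y in (0, 1))

def is_even(perm):
    """A permutation is even iff (size - number of cycles) is even."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = perm[x]
    return (len(perm) - cycles) % 2 == 0

# All 256 Boolean functions of 3 inputs, given by their truth tables.
tables = list(product((0, 1), repeat=8))
even_tables = [tt for tt in tables if is_even(pi_f(tt))]
```

Exactly half of the 256 functions give an even $\pi_f$, which matches the total of 128 reported in Table 2.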

Given $\pi_f$, we can use the algorithm of Section 4 to construct an optimal circuit for it. Table 2
gives the optimal circuit sizes of functions $\pi_f$ corresponding to 3-input 1-output functions $f$
("3+1 oracles") which can be synthesized on four wires. These circuits are significantly smaller
than many optimal circuits on four wires. This is not surprising, as they perform less computation.

In Grover oracle circuits, the main input lines preserve their input values and only the temporary
storage lines can change their values. Therefore, Travaglione et al. [21] studied circuits
where some lines cannot be changed even at intermediate stages of computation. In their terminology,
a circuit with $k$ lines that we are allowed to modify and an arbitrary number of read-only

Circuit size      0   1   2   3   4   5   6   7   Total
No. of circuits   1   7  21  35  35  24   4   1     128

Table 2: Optimal 3+1 oracle circuits for Grover's search.



lines is called a $k$-bit ROM-based circuit. They show how to compute permutation $\pi_f$ arising
from a Boolean function $f$ using a 1-bit quantum ROM-based circuit, and prove that if only classical
gates are allowed, two writable bits are necessary. Two bits are sufficient if the CNT gate
library is used. The synthesis algorithms of Travaglione et al. [21] rely on XOR sum-of-products
decompositions of $f$. We outline their method in a proof of the following result.

Lemma 38 H21]l There exists a reversible 2-bit ROM-based CNT-circuit computing (x,a,b) 
{x,a,b(B f{x)), where x is a k-bit input. If a function's XOR decomposition consists of only one 
term, let k be the number of literals appearing ( without complementation ). Ifk > then 3 • 2^^ ^ — 2 
gates are required. 

Proof: Assume we are given an XOR sum-of-products decomposition of $f$. Then it suffices to 
know how to transform $(x,a,b) \to (x,a,b \oplus p)$ for an arbitrary product of uncomplemented 
literals $p$, because then we can add the terms in an XOR decomposition term by term. So, without 
loss of generality, let $p = x_1 \cdots x_m$. Denote by $T(a,b;c)$ a T gate with controls on $a,b$ 
and inverter on $c$. Similarly, denote by $C(a;b)$ a C gate with control on $a$ and inverter on $b$. 
Number the ROM wires $1, \ldots, k$, and the non-ROM wires $k+1$ and $k+2$. Let us first suppose 
that there is at least one uncomplemented literal, and put a $C(1;k+2)$ on the circuit; note that 
$C(1;k+2)$ applied to the input $(x,a,b)$ gives $(x,a,b \oplus x_1)$. We will write this as 
$C(1;k+2) : (x,a,b) \to (x,a,b \oplus x_1)$, and denote this operation by $W_1$. Then, we define 
the circuit $W'_2$ as the sequence of gates $T(2,k+2;k+1)\, W_1\, T(2,k+2;k+1)\, W_1$, and one can 
check that $W'_2 : (x,a,b) \to (x,a \oplus x_1 x_2,b)$. We define $W_2$ by exchanging the wires 
$k+1$ and $k+2$; clearly $W_2 : (x,a,b) \to (x,a,b \oplus x_1 x_2)$. In general, given a circuit 
$W_l : (x,a,b) \to (x,a,b \oplus x_1 \cdots x_l)$, we define 
$W'_{l+1} = T(l+1,k+2;k+1)\, W_l\, T(l+1,k+2;k+1)\, W_l$; one can check that 
$W'_{l+1} : (x,a,b) \to (x,a \oplus x_1 \cdots x_{l+1},b)$. Define $W_{l+1}$ by exchanging the 
wires $k+1$ and $k+2$; then clearly $W_{l+1} : (x,a,b) \to (x,a,b \oplus x_1 \cdots x_{l+1})$. 
By induction, we can get as many uncomplemented literals in this product as we like. □ 
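
The inductive construction in the proof can be checked mechanically. The following Python sketch 
is an illustration only, not code from [21] or from our tool. It assumes a gate is represented as 
a pair (controls, target) over wires numbered 1 to k+2, and the helper names build_W, 
swap_writable and run are hypothetical. It builds $W_l$ exactly as in the proof and simulates it 
on every input. Since each step doubles the previous circuit and adds two T gates, the gate count 
satisfies $g(1) = 1$, $g(l+1) = 2g(l) + 2$, which gives $g(l) = 3 \cdot 2^{l-1} - 2$. 

```python
from itertools import product

def C(ctrl, tgt):
    """C gate: control on wire ctrl, inverter on wire tgt."""
    return ((ctrl,), tgt)

def T(c1, c2, tgt):
    """T gate: controls on wires c1 and c2, inverter on wire tgt."""
    return ((c1, c2), tgt)

def swap_writable(circuit, k):
    """Exchange the roles of the two non-ROM wires k+1 and k+2 in every gate."""
    m = {k + 1: k + 2, k + 2: k + 1}
    return [(tuple(m.get(c, c) for c in ctrls), m.get(t, t)) for ctrls, t in circuit]

def build_W(l, k):
    """Return W_l : (x, a, b) -> (x, a, b XOR x_1...x_l) as a list of gates."""
    W = [C(1, k + 2)]                      # W_1
    for i in range(2, l + 1):
        t = T(i, k + 2, k + 1)
        W_prime = [t] + W + [t] + W        # W'_i : a <- a XOR x_1...x_i, b restored
        W = swap_writable(W_prime, k)      # W_i  : b <- b XOR x_1...x_i, a restored
    return W

def run(circuit, bits):
    """Apply the gates in list order to a tuple of bits (wire 1 first)."""
    state = dict(enumerate(bits, start=1))
    for ctrls, tgt in circuit:
        if all(state[c] for c in ctrls):
            state[tgt] ^= 1
    return tuple(state[w] for w in sorted(state))

def check(l, k):
    """Verify W_l on all 2^(k+2) inputs and confirm the gate count."""
    W = build_W(l, k)
    for bits in product((0, 1), repeat=k + 2):
        x, a, b = bits[:k], bits[k], bits[k + 1]
        p = int(all(x[:l]))                # the product x_1...x_l
        if run(W, bits) != x + (a, b ^ p):
            return False
    return len(W) == 3 * 2 ** (l - 1) - 2
```

Calling check(l, 3) for l = 1, 2, 3 exercises the three product lengths available with 3 bits of 
ROM. In the generated circuits every gate has at most one control on a ROM wire. 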

The heuristic presented above has the property that none of its gates has more than one control 
bit on a ROM bit. Indeed, Travaglione et al. [21] had restricted their attention to circuits with 
precisely this property. However, they note [21] that their results do not depend on this restriction. 

We applied the construction of Lemma 38 to all 256 functions implementable in 2-bit ROM- 
based circuits with 3 bits of ROM. The circuit size distribution is given in the line labeled XOR 
in Table 3. In comparing with circuit lengths resulting from our synthesis algorithm presented 
earlier, we consider two cases. First, in the OPT T line, we only look at circuits satisfying the 
restriction mentioned above. Then, in the OPT line, we relax this restriction and give the circuit 
size distribution for optimal circuits.^ 

Table 3: Circuit size distribution of 3+2 ROM-based circuits synthesized using various algorithms. 
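
As a rough illustration of how such a tabulation over all 256 functions might be organized, the 
Python sketch below computes a circuit size for each truth table. It rests on assumptions that go 
beyond Lemma 38: the XOR decomposition is taken to be the positive-polarity one (no complemented 
literals), each product term is charged $3 \cdot 2^{d-1} - 2$ gates for $d > 0$ literals, and a 
constant term is charged a single N gate. The names anf, term_cost and xor_construction_size are 
hypothetical, and the result is not claimed to reproduce the XOR line of Table 3. 

```python
from collections import Counter

def anf(truth):
    """Positive-polarity XOR decomposition of a truth table (Moebius transform).

    truth[m] is the function value on the input whose set bits are given by m;
    the returned list has a 1 at index m iff the product of those variables appears.
    """
    coeffs = list(truth)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        for m in range(len(coeffs)):
            if m & (1 << i):
                coeffs[m] ^= coeffs[m ^ (1 << i)]
    return coeffs

def term_cost(degree):
    """Gates charged to one product term (assumption: constant term = one N gate)."""
    return 1 if degree == 0 else 3 * 2 ** (degree - 1) - 2

def xor_construction_size(truth):
    """Total gate count when the terms are added one after another."""
    return sum(term_cost(bin(m).count("1")) for m, c in enumerate(anf(truth)) if c)

# Size distribution over all 256 Boolean functions of 3 ROM bits.
distribution = Counter(
    xor_construction_size([(f >> i) & 1 for i in range(8)]) for f in range(256)
)
```

Under these assumptions the single term $x_1 x_2 x_3$ costs 10 gates, in agreement with the 
formula of Lemma 38 for three literals. 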

Most functions computable by a 2-bit ROM-based circuit actually require two writable bits 
[21]. Whether or not a given function can be computed by a 1-bit ROM-based CNT-circuit can 
be determined by the following constructive procedure. Observe that gates in 1-bit ROM circuits 
can be reordered arbitrarily, as no gate affects the control bits of any other gate. Thus, whether 
or not a C or T gate flips the controlled bit depends only on the circuit inputs. Furthermore, 
multiple copies of the same gate on the same wires cancel out, and we can assume that at most 
one is present in an optimal circuit. A synthesis procedure can then check which gates are present 
by applying the permutation on every possible input combination with zero, one, or two 1s in its 
binary expansion. (Again, we have relaxed the restriction that only one control may be on a ROM 
wire.) If the value of the function is 1, the circuit needs an N, C or T gate controlled by those bits. 
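
One way to read this procedure is sketched below in Python. This is our own interpretation, not a 
definitive implementation: the function name synthesize_1bit_rom and the representation of a gate 
by its tuple of ROM control wires (the empty tuple standing for an N gate) are assumptions. Inputs 
are visited in order of increasing weight, and a gate is placed when the function value differs 
from what the gates already chosen on subsets of those wires produce. A final pass over all inputs 
decides whether one writable bit suffices. 

```python
from itertools import combinations, product

def synthesize_1bit_rom(f, k):
    """Try to realize b <- b XOR f(x) with N, C and T gates on one writable bit.

    f maps a tuple of k ROM bits to 0 or 1. Returns the list of control tuples
    (wire indices 0..k-1) of the gates used, or None if no such circuit exists.
    """
    def point(ones):
        # Input with 1s exactly on the wires listed in `ones`.
        return tuple(1 if i in ones else 0 for i in range(k))

    gates = []
    for weight in range(3):                      # zero, one, or two 1s
        for ones in combinations(range(k), weight):
            value = f(point(ones))
            # Gates already placed on a subset of these wires also fire here.
            for g in gates:
                if set(g) <= set(ones):
                    value ^= 1
            if value:
                gates.append(ones)

    # Gates commute and duplicates cancel, so the candidate is unique:
    # accept it only if it agrees with f on every input.
    for x in product((0, 1), repeat=k):
        out = 0
        for g in gates:
            out ^= int(all(x[i] for i in g))
        if out != f(x):
            return None
    return gates
```

For example, with k = 3 the function $x_1 x_2 \oplus x_3 \oplus 1$ (wires indexed from 0 in the 
code) yields one N gate, one C gate and one T gate, while $x_1 x_2 x_3$ is rejected because it 
would need a gate with three controls. 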

Observe that adding the S gate to the gate library during k+1 ROM synthesis will never 
decrease circuit sizes: no two wires can be swapped, since at least one of them is a ROM wire. 
In the case of k+2 ROM synthesis, only the two non-ROM wires can be swapped, and one of 
them must be returned to its initial value by the end of the computation. We ran an experiment 
comparing circuit lengths in the 3+2 ROM-based case and found no improvement in circuit sizes 
upon adding the S gate, but we have been unable to prove this in the general case. 

^Using a circuit library with ≤ 6 gates (191Mb file, 1.5 min to generate), the OPT line takes 5 min to generate. The 
use of a 5-gate library improved the runtimes by at least 2x if we do not synthesize the only circuit of size 11. For the OPT 
T line, we first find the 250 optimal circuits of size ≤ 12 (15 min) using a 6-gate library (61Mb, 5 min). The remaining 6 
functions were synthesized in 5 min with a 7-gate library (376Mb, 10 min). This required more than 1Gb of RAM. 
6 Conclusions 



We have explored a number of promising techniques for synthesizing optimal and near-optimal 
reversible circuits that require little or no temporary storage. In particular, we have proven that ev- 
ery even permutation function can be synthesized without temporary storage using the CNT gate 
library. Similarly, any permutation, even or odd, can be synthesized with up to one bit of tem- 
porary storage. We have recently discovered that A. De Vos has independently demonstrated this 
result; however, his proof relies on non-trivial group-theoretic notions and resorts to a computer 
algebra package for a special case [5]. We give a much more elementary analysis, and moreover 
our proof techniques are sufficiently constructive to be interpreted as a synthesis heuristic. We 
have also derived various equivalences among CNT-circuits that are useful for synthesis purposes, 
and given a decomposition of a CNT-circuit into a T|C|T|N-circuit. 

To further investigate the structure of reversible circuits, we developed a method for syn- 
thesizing optimal reversible circuits. While this algorithm scales better than its counterparts for 
irreversible computation fTV\, its runtime is still exponential. Nonetheless, it can be used to study 
small problems in detail, which may be of interest to physicists building quantum computing 
devices, because the current state of the art is largely limited to 10 qubits. One might think that an 
exhaustive search procedure would suffice for small problems, but in fact, even for three-input 
circuits, an exhaustive search is nowhere near finished after 15 hours; our procedure terminates in 
minutes. Our experimental data about all optimal reversible circuits on three wires using various 
subsets of the CNTS library reveal some interesting characteristics of optimal reversible circuits. 
Such statistics, extrapolated to larger circuits, can be used in the future to guide heuristics, and 
may suggest new theorems about reversible circuits. 

Finally, we have applied our optimal synthesis tool to the design of oracle circuits for a key 
quantum computing application, Grover's search algorithm, and obtained much smaller circuits 
than previous methods. Ultimately, we aim to extend the proposed methods to handle larger and 
more general circuits, with the eventual goal of synthesizing quantum circuits containing dozens 
of qubits. 